Draw processor for a high performance three dimensional graphics accelerator.



A draw processor for a graphics accelerator is disclosed that performs edgewalking and scan
interpolation functions to render a three dimensional geometry object defined by a draw packet.
The draw processor renders a subset of pixels on a scan line, such that a set of draw processors
taken together render the entire geometry object. The draw processor renders pixels into an
interleave bank of a multiple bank interleaved frame buffer. The draw processor also processes
direct port data through a direct port pipeline.

BACKGROUND OF THE INVENTION

1. RELATED APPLICATIONS: 

This application is a continuation-in-part of U.S. patent application serial no. 08/071,699, filed on June 4,
1993, entitled An Architecture for a High Performance Three Dimensional Graphics Accelerator.

2. FIELD OF THE INVENTION: 

This invention relates to the field of computer graphics systems. More particularly, this invention relates
to a draw processor for a high performance three dimensional graphics accelerator in a computer system.

3. ART BACKGROUND: 

A three dimensional graphics accelerator is a specialized graphics rendering subsystem for a computer
system. Typically, an application program executing on a host processor of the computer system generates
three dimensional geometry input data that defines three dimensional graphics elements for display on a
display device. The application program typically transfers the geometry input data from the host processor to
the graphics accelerator. Thereafter, the graphics accelerator renders the corresponding graphics elements
on the display device.

The design architecture of a high performance three dimensional graphics system historically embodies
a balance between system performance and system cost. The typical design goal is to increase system
performance while minimizing increases in system cost. However, prior graphics systems usually suffer from either
limited performance or high cost due to a variety of system constraints.

For example, a high performance graphics system typically implements an interleaved frame buffer
comprised of multiple VRAM banks because the minimum read-modify-write cycle time for commercially available
video random access memory (VRAM) chips is a fundamental constraint on rendering performance. The
implementation of multiple interleaved VRAM banks enables parallel pixel rendering into the frame buffer to
increase overall rendering performance. Unfortunately, the separate addressing logic required for each
interleave VRAM bank increases the cost and power consumption of such high performance systems.

On the other hand, a graphics system may implement a rendering processor on a single integrated circuit
chip to minimize cost and power consumption. Unfortunately, such systems suffer from poor rendering
performance due to the limited number of interface pins available with the single integrated circuit chip. The limited
number of interface pins reduces the interleave factor for the frame buffer, thereby precluding the rendering
performance benefits of parallel processing.

Prior graphics systems often employ a parallel processing pipeline to increase graphics processing
performance. For example, the scan conversion function for a shaded triangle in a graphics system is typically
performed by a linear pipeline of edgewalking and scan interpolation. Typically in such systems, the
edgewalking function is performed by an edgewalking processor, and the scan interpolation function is performed by a
set of parallel scan interpolation processors that receive parameters from the edgewalking processor.

However, such systems fail to obtain parallel processing speed benefits when rendering relatively long thin
triangles, which are commonly encountered in tessellated geometry. The parameter data flow between the
edgewalking processor and the scan interpolation processors greatly increases when performing scan
conversion on long thin triangles. Unfortunately, the increased parameter data flow slows triangle rendering and
reduces graphics system performance.

SUMMARY OF THE INVENTION

A draw processor for a graphics accelerator is disclosed that performs edgewalking and scan interpolation
functions through a three dimensional geometry pipeline to render a three dimensional geometry object and
that performs pixel functions through a direct port pipeline. The draw processor renders a subset of pixels on
a scan line, such that a set of draw processors taken together render the entire geometry object. The draw
processor renders pixels and processes the pixel functions into an interleave bank of a multiple bank interleaved
frame buffer.

The draw processor comprises a geometry pipeline interface circuit that receives a draw packet over a
draw bus from a floating-point processor, wherein the draw packet contains a set of geometry parameters that
define a geometry object. The geometry pipeline interface circuit adjusts the geometry parameters according
to an interleave value corresponding to the draw processor.






EP0 631 252 A2 

The draw processor also comprises a rendering circuit that receives the geometry parameters from the 
geometry pipeline interface circuit and that generates a set of pixels corresponding to the geometry object 
by performing edgewalking and scan interpolation functions according to the geometry parameters. 

The draw processor also comprises a direct port interface circuit that receives a direct port packet over
the draw bus from a command preprocessor. The direct port packet contains a set of pixel function parameters
that control at least one pixel function of the draw processor.

The draw processor also comprises a memory control circuit that receives the pixels from the direct port
interface circuit and the pixel function parameters from the direct port interface circuit. The memory control
circuit writes the pixels into a frame buffer memory while performing the pixel function.


BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a block diagram of a computer system including a host processor, a memory subsystem, a
graphics accelerator, and a display device.

Figure 2 is a block diagram of the graphics accelerator, which is comprised of a command preprocessor,
a set of floating-point processors, a set of draw processors, a frame buffer, a post-processor, and a random
access memory/digital-to-analog converter (RAMDAC).

Figure 3 is a block diagram of the command preprocessor which shows the reformatting circuitry of the
3D geometry pipeline, along with the direct port data pipeline.

Figure 4 is a block diagram of a floating-point processor section, including a control store (CS), an input
circuit, an output circuit, a register file, a set of functional units, a control circuit, and an SRAM interface circuit.

Figure 5 is a block diagram of the draw processor which shows a 3D geometry pipeline interface circuit,
a rendering circuit, a direct port bus interface circuit, and a memory control circuit.

Figure 6 illustrates the 3D geometry pipeline interface circuit which receives output geometry packets
for the 3D geometry pipeline over the CD-BUS.

Figure 7 lists the format of the 3D geometry pipeline commands contained in output geometry packets,
and shows the double buffer registers (dbr1-dbr23) employed for each parameter in a command packet.

Figure 8 shows the coding of the Z-Buffering control field for hidden surface removal (HSR) functions
using Z as a depth buffer.

Figure 9 shows the coding of the Z-Buffering control field for window ID (WID) functions using Z as a
window ID extension.

Figures 10-13 illustrate the pixel operations of the draw processor, and show the conditions tested by
the draw processor prior to pixel writes to the frame buffer.

DETAILED DESCRIPTION OF THE INVENTION

An architecture for a high performance three dimensional graphics accelerator in a computer system is
disclosed. In the following description, for purposes of explanation, specific applications, numbers, apparatus,
configurations and circuits are set forth in order to provide a thorough understanding of the present invention.
However, it will be apparent to one skilled in the art that the present invention may be practiced without these
specific details. In other instances, well known systems are shown in diagrammatical or block diagram form in
order not to obscure the present invention unnecessarily.

Referring now to Figure 1, a block diagram of a computer system is shown, including a host processor 20,
a memory subsystem 22, a graphics accelerator 24, and a display device 26. The host processor 20, the
memory subsystem 22, and the graphics accelerator 24 are each coupled for communication over a host bus 28.

The display device 26 represents a wide variety of raster display monitors. The host processor 20
represents a wide variety of computer processors, multiprocessors and CPUs, and the memory subsystem 22
represents a wide variety of memory subsystems including random access memories and mass storage devices.
The host bus 28 represents a wide variety of communication or host computer busses for communication
between host processors, CPUs, and memory subsystems, as well as specialized subsystems.

The host processor 20 transfers information to and from the graphics accelerator 24 according to a
programmed input/output (I/O) protocol over the host bus 28. Also, the graphics accelerator 24 accesses the
memory subsystem 22 according to a direct memory access (DMA) protocol.

A graphics application program executing on the host processor 20 generates geometry data arrays
containing three dimensional geometry information that define an image for display on the display device 26. The
host processor 20 transfers the geometry data arrays to the memory subsystem 22. Thereafter, the graphics
accelerator 24 reads in geometry data arrays using DMA access cycles over the host bus 28. Alternatively,
the host processor 20 transfers the geometry data arrays to the graphics accelerator 24 with programmed I/O
over the host bus 28.

The three dimensional geometry information in the geometry data arrays comprises a stream of input
vertex packets containing vertex coordinates (vertices), vertex position, and other information that defines
triangles, vectors and points in a three dimensional space which is commonly referred to as model space. Each
input vertex packet may contain any combination of three dimensional vertex information, including vertex
normal, vertex color, facet normal, facet color, texture map coordinates, pick-id's, headers and other information.

A headerless input vertex packet may define a triangle strip in the form of a "zig zag" pattern of adjacent
triangles. A headerless input vertex packet may also define a triangle strip in the form of a "star strip" pattern
of triangles. In addition, a headerless input vertex packet may define a strip of isolated triangles. An input vertex
packet having a header may change triangle strip formats for each triangle and change between "zig zag"
format, "star" format, and isolated triangles.

Figure 2 is a block diagram of the graphics accelerator 24. The graphics accelerator 24 is comprised of a
command preprocessor 30, a set of floating-point processors 40-43, a set of draw processors 50-54, a frame
buffer 100, a post-processor 70 and a random access memory/digital-to-analog converter (RAMDAC) 72. The
RAMDAC 72 is similar to commercially available RAMDACs that implement look-up table functions.

For one embodiment, the command preprocessor 30, the floating-point processors 40-43, the draw
processors 50-54, and the post-processor 70 are each individual integrated circuit chips.

The command preprocessor 30 is coupled for communication over the host bus 28. The command
preprocessor 30 performs DMA reads of the geometry data arrays from the memory subsystem 22 over the host
bus 28. The host processor 20 transfers virtual memory pointers to the command preprocessor 30. The virtual
memory pointers point to the geometry data arrays in the memory subsystem 22. The command preprocessor
30 converts the virtual memory pointers to physical memory addresses for performing the DMA reads to the
memory subsystem 22 without intervention from the host processor 20.

The command preprocessor 30 implements two data pipelines: a 3D geometry pipeline, and a direct port
pipeline.

In the direct port pipeline, the command preprocessor 30 receives direct port data over the host bus 28,
and transfers the direct port data over a command-to-draw bus (CD-BUS) 80 to the draw processors 50-54 as
direct port packets. The direct port data is optionally processed by the command preprocessor 30 to perform
X11 functions such as character writes, screen scrolls, and block moves in concert with the draw processors
50-54. The direct port data may also include register writes to the draw processors 50-54, and individual pixel
writes to the frame buffer 100.

In the 3D geometry pipeline, the command preprocessor 30 accesses a stream of input vertex packets
from the geometry data arrays, reorders the information contained within the input vertex packets, and
optionally deletes information in the input vertex packets. The command preprocessor 30 reorders the information
from the input vertex packet into reformatted vertex packets having a standardized element order. The
command preprocessor 30 then transfers output geometry packets over a command-to-floating-point bus
(CF-BUS) 82 to one of the floating-point processors 40-43. The output geometry packets comprise the reformatted
vertex packets with optional modifications and data substitutions.

The command preprocessor 30 converts the information in each input vertex packet from differing number
formats into the 32 bit IEEE floating-point number format. The command preprocessor 30 converts 8 bit
fixed-point numbers, 16 bit fixed-point numbers, and 32 bit or 64 bit IEEE floating-point numbers.

The command preprocessor 30 either reformats or inserts header fields, inserts constants, and generates
and inserts sequential pick-id's, and optionally inserts constant sequential pick-id's. The command
preprocessor 30 examines the chaining bits of the header and reassembles the information from the input vertex packets
into the reformatted vertex packets containing completely isolated geometry primitives including points, lines
and triangles.

The command preprocessor 30 receives control and status signals from the floating-point processors
40-43 over a control portion of the CF-BUS 82. The control and status signals indicate the availability of input
buffers within the floating-point processors 40-43 for receiving the output geometry packets.

The floating-point processors 40-43 are each substantially similar. Each floating-point processor 40-43
implements a 32 bit micro-code driven floating-point core, along with parallel input and output packet
communication hardware. Each of the floating-point processors 40-43 implements floating-point functions including
multiply, ALU, reciprocal, reciprocal-square-root and integer operations. Each floating-point processor 40-43
implements a wide assortment of specialized graphics instructions and features. Each floating-point processor
40-43 is optimized to implement the number of fast internal registers required to perform the largest common
three dimensional graphics processing micro-code inner loop implemented by the graphics accelerator 24.

For one embodiment, each floating-point processor 40-43 is implemented on a single integrated circuit
chip. The only support chips required for each floating-point processor 40-43 are a set of four external SRAM
chips that provide an external micro-code in a control store (CS).

Each floating-point processor 40-43 implements a function for setting up triangles for scan conversion by
the draw processors 50-54. The first step of the setup function sorts the three vertices of a triangle in ascending
y order. Each floating-point processor 40-43 broadcasts draw packets to all of the draw processors 50-54
over the CD-BUS 80. The draw packets contain final geometry primitives, including triangles, points and lines.

The draw processors 50-54 function as VRAM control chips for the frame buffer 100. The draw processors
50-54 concurrently render an image into the frame buffer 100 according to a draw packet received from one
of the floating-point processors 40-43 or according to a direct port packet received from the command
preprocessor 30.

Each draw processor 50-54 performs the scan conversion functions of edgewalking and scan
interpolation. The replication of the edgewalking and scan interpolation functions among the draw processors
50-54 obviates the need for large scale communication pathways between separate edgewalking and scan
interpolation processors, thereby minimizing the pin counts of each of the draw processors 50-54 and
decreasing printed circuit board space requirements.

The frame buffer 100 is arranged as a set of 5 VRAM interleave banks. The draw processor 50 writes pixel
data into an interleave bank_0 61, the draw processor 51 writes pixel data into an interleave bank_1 62, the
draw processor 52 writes pixel data into an interleave bank_2 63, the draw processor 53 writes pixel data into
an interleave bank_3 64, and the draw processor 54 writes pixel data into an interleave bank_4 65.

Each draw processor 50-54 renders only the pixels visible within the corresponding interleave bank
61-65. The draw processors 50-54 concurrently render the triangle primitive defined by a draw packet to produce
the correct combined rasterized image in the frame buffer 100. Each draw processor 50-54 rasterizes every
fifth pixel along each scan line of the final rasterized image. Each draw processor 50-54 starts a scan line
biased by 0, 1, 2, 3, or 4 pixel spaces to the right.
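
By way of illustration only, the division of a scan line among five draw processors may be sketched as
follows. The sketch assumes that a pixel at horizontal position x belongs to interleave bank (x mod 5); the
text states only that each draw processor takes every fifth pixel with a bias of 0 to 4, so the exact mapping
and the function name pixels_for_processor are hypothetical.

```python
NUM_BANKS = 5  # five draw processors and interleave banks, per the text


def pixels_for_processor(interleave, x_start, x_end, num_banks=NUM_BANKS):
    """Return the x positions in [x_start, x_end] owned by one draw processor.

    Assumption: pixel x belongs to bank (x mod num_banks).
    """
    # First x >= x_start whose bank index equals this processor's interleave value.
    first = x_start + (interleave - x_start) % num_banks
    return list(range(first, x_end + 1, num_banks))


# A span covering pixels 3..14 of one scan line, split across the five processors.
span = (3, 14)
per_bank = [pixels_for_processor(i, *span) for i in range(NUM_BANKS)]
```

Taken together, the five lists cover every pixel of the span exactly once, which is the property the
interleaved arrangement relies on.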

Each draw processor 50-54 optionally performs depth cueing. Each pixel of a triangle, vector or dot
rendered may be depth cued within the draw processors 50-54 without the performance penalty of prior graphics
systems that perform depth cueing in floating-point processors. Each draw processor 50-54 optionally
performs rectangular window clipping, blending and other pixel processing functions.

The post-processor 70 receives interleaved pixel data from the frame buffer 100 over the video bus 84.
The post-processor 70 performs color look-up table and cursor functions. The RAMDAC 72 converts the pixel
data received from the post-processor 70 into video signals 73 for the display device 26.

Figure 3 is a block diagram of the command preprocessor 30. The command preprocessor 30 is shown
coupled to the host bus 28 for communication through the 3D geometry pipeline and the direct port pipeline.
For one embodiment, the command preprocessor 30 is implemented on a single integrated circuit chip.

The direct port pipeline comprises an input interface 541 and an X11 operations circuit 551. The input
interface 541 receives direct port data over the host bus 28, and transfers the direct port data over the CD-BUS
80 to the draw processors 50-54. The direct port data includes register writes to the draw processors 50-54
and individual pixel writes to the frame buffer 100. The direct port data is optionally transferred to the X11
operations circuit 551 to perform X11 functions such as character writes, screen scrolls, and block moves in
concert with the draw processors 50-54.

The 3D geometry pipeline comprises the input interface 541, a bucket buffer 542, a format converter 543,
and a vertex buffer comprising a set of vertex registers 549 and alternate tupple registers 540. Format conversion
in the 3D geometry pipeline is controlled by a VCS operations circuit 545 and a converter sequencer 544.
Output geometry packets are assembled by a primitive assembly circuit 547 and a sequencer 548. A 32-16 circuit
550 optionally performs data compression. A set of internal registers 552 are programmed over the host bus
28 to control the operations of the 3D geometry pipeline and the direct port pipeline. A DMA controller 546
performs DMA transfers into the bucket buffer 542 over the host bus 28.

The input interface 541 contains a burst buffer for interfacing between the differing clocking environments
of the host bus 28 and the command preprocessor 30. The burst buffer functions as a set of temporary holding
registers for input vertex packets transferred into the bucket buffer 542.

The format converter circuit 543 accesses the input vertex packets from the bucket buffer 542, and
assembles the reformatted vertex packets into the vertex registers 549. The format converter circuit 543 is
controlled by the VCS operations circuit 545 according to preprogrammed format conversion operations. The
format conversion is sequenced by the converter sequencer 544.

The primitive assembly circuit 547 under control of the sequencer 548 accesses the reformatted vertex
packets from the vertex registers 549, and transfers the output geometry packets over the CF-BUS 82. The
primitive assembly circuit 547 optionally substitutes alternate tupples from the alternate tupple registers 540.
The primitive assembly circuit 547 also optionally performs data compression on data in the output geometry
packets using the 32-16 circuit 550.

The format converter 543 processes input vertex packets that define a triangle strip. Header bits in each
input vertex packet specify a replacement type. The replacement type defines the combination of a subsequent
input vertex packet with previous input vertex packets to form a next triangle in the triangle strip. The format
converter 543 implements a register stack that holds the last three vertices in the triangle strip. The format
converter 543 labels the last three vertices in the triangle strip as the oldest, the middlest, and the newest.

A triangle strip with a "zig-zag" pattern corresponds to a new input vertex packet having a header that
specifies the replacement type replace_oldest. The replacement type replace_oldest causes the format converter
543 to replace the oldest vertex by the middlest, to replace the middlest vertex by the newest, and to set
the newest vertex to the vertex in the new input vertex packet. The foregoing pattern corresponds to a
PHIGS_PLUS simple triangle strip.

A triangle strip with a "star" pattern corresponds to a new input vertex packet having a header that specifies
the replacement type replace_middlest. The replacement type replace_middlest causes the format converter
543 to leave the oldest vertex unchanged, to replace the middlest vertex by the newest vertex, and to set the
newest vertex to the vertex in the new input vertex packet.

To begin a generalized triangle strip, a new input vertex packet has a header that specifies the replacement
type restart. The replacement type restart causes the format converter 543 to mark the oldest and the middlest
vertices as invalid, and to set the newest vertex to the vertex in the new input vertex packet.

The primitive assembly circuit 547 transfers an output geometry packet for a triangle from the vertex
registers 549 and alternate tupple registers 540 over the CF-BUS 82 whenever a replacement operation generates
three valid vertices in the vertex registers 549.
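
By way of illustration only, the three replacement types and the rule for emitting a triangle may be
modeled in software as follows. The function name assemble_triangles and the tuple representation of a
packet are hypothetical, and the sketch omits the face ordering normalization and the alternate tupple
substitution.

```python
def assemble_triangles(packets):
    """Model the oldest/middlest/newest register stack of the format converter.

    packets: iterable of (replacement_type, vertex) pairs.
    Returns the list of (oldest, middlest, newest) triangles emitted.
    """
    oldest = middlest = newest = None  # None marks an invalid vertex
    triangles = []
    for rtype, vertex in packets:
        if rtype == "restart":
            # Mark the oldest and middlest vertices invalid.
            oldest, middlest = None, None
        elif rtype == "replace_oldest":
            # Zig-zag: oldest <- middlest, middlest <- newest.
            oldest, middlest = middlest, newest
        elif rtype == "replace_middlest":
            # Star: oldest unchanged, middlest <- newest.
            middlest = newest
        else:
            raise ValueError(f"unknown replacement type: {rtype}")
        newest = vertex
        # A triangle is emitted whenever all three registers are valid.
        if oldest is not None and middlest is not None:
            triangles.append((oldest, middlest, newest))
    return triangles


zig_zag = [("restart", "A"), ("replace_oldest", "B"),
           ("replace_oldest", "C"), ("replace_oldest", "D")]
star = [("restart", "A"), ("replace_oldest", "B"), ("replace_oldest", "C"),
        ("replace_middlest", "D"), ("replace_middlest", "E")]
```

In this model the zig-zag sequence yields triangles ABC and BCD, while the star sequence yields ABC, ACD
and ADE, each sharing the vertex A.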

The restart replacement type in the header of an input vertex packet corresponds to a move operation for
polylines. The restart replacement type enables a single data structure, the geometry data array in the memory
subsystem 22, to specify multiple unconnected variable length triangle strips. Such a capability reduces the
overhead required for starting a DMA sequence over the host bus 28.

The replacement types in the input vertex packets received by the command preprocessor 30 from the
geometry data array in the memory subsystem 22 enable a triangle strip to change from a "zig zag" pattern to
a "star" pattern in the middle of the strip. Such a capability enables the representation of complex geometry
in a compact data structure while requiring minimal input data bandwidth over the host bus 28.

The format converter 543 rearranges the vertex order in the vertex registers 549 after every replace_oldest
replacement type to normalize the facing of the output triangles in the reformatted vertex packets. The primitive
assembly circuit 547 rearranges the vertex order as the vertex is transferred out of the vertex registers 549
such that the front face of the output triangle is always defined by a clockwise vertex order.

A header bit in an input vertex packet specifies an initial face ordering of each triangle strip. In addition,
the command preprocessor 30 contains a register with a state bit which causes reversal of the initial face
ordering specified in the header. An application program executing on the host processor 20 maintains the state
bit to reflect a model matrix maintained by the application program. Also, the command preprocessor 30
reverses the face ordering for every triangle in a "zig-zag" pattern.

The primitive assembly circuit 547 transfers each reformatted vertex packet from the vertex registers 549
to a next available floating-point processor 40-43. The next available floating-point processor 40-43 is
determined by sensing input buffer status of each floating-point processor 40-43 over a control portion of the
CF-BUS 82.

The command preprocessor 30 maintains a record or "scoreboard" of the ordering of transfer of each output
geometry packet to the floating-point processors 40-43. The command preprocessor 30 controls the output
buffers of the floating-point processors 40-43 by transferring control signals over a control portion of the
CD-BUS 80. The command preprocessor 30 ensures that the output geometry packets are processed through the
floating-point processors 40-43 in the proper order when a sequential rendering order is required. If sequential
rendering is not required, then the first draw packet at the output of the floating-point processors 40-43 is
rendered first.
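
By way of illustration only, the effect of the scoreboard on the order in which draw packets are released
may be sketched as follows. The function name drain_order and the use of packet identifiers are
hypothetical; the sketch models only the ordering decision and not the control signals on the CD-BUS 80.

```python
from collections import deque


def drain_order(dispatch_order, completion_order, sequential):
    """Return the order in which draw packets are released to the draw bus.

    dispatch_order: packet ids in the order sent to the floating-point processors.
    completion_order: the same ids in the order the processors finish them.
    """
    if not sequential:
        # The first draw packet available at an output is rendered first.
        return list(completion_order)
    scoreboard = deque(dispatch_order)  # record of the dispatch ordering
    finished = set()
    released = []
    for packet in completion_order:
        finished.add(packet)
        # Release from the head of the scoreboard while the head has finished.
        while scoreboard and scoreboard[0] in finished:
            released.append(scoreboard.popleft())
    return released
```

For example, with packets dispatched as 0, 1, 2, 3 and completed as 2, 0, 1, 3, the sequential mode releases
0, 1, 2, 3, whereas the non-sequential mode releases them in completion order.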

The format converter 543 also reformats polylines and poly-polylines. In addition, the format converter 543
optionally converts triangle strip data into polyline edges. Such a capability reduces the complexity of the
micro-code for the floating-point processors 40-43 because triangle processing is not mixed with line processing
during operations that require triangle edge highlighting.

To process edge highlighting of triangles within a triangle strip, the command preprocessor 30 assembles
the input vertex packets for the triangle strip into reformatted vertex packets, and passes the reformatted
vertex packets to the floating-point processors 40-43 over the CF-BUS 82 as output geometry packets.
Thereafter, the command preprocessor 30 accesses the original triangle strip input vertex packets over the host bus
28, and assembles the input vertex packets into reformatted vertex packets containing isolated vectors
representing highlighted edges. The command preprocessor 30 then processes the isolated vectors through the
floating-point processors 40-43 and the draw processors 50-54 to perform the highlighting function.

For one embodiment, the data portion of the CF-BUS 82 is 16 bits wide, and the data portion of the
CD-BUS 80 is 16 bits wide. The command preprocessor 30 optionally compresses color and normal data
components of the reformatted vertex packets using the 32-16 circuit 550 before transfer to the floating-point
processors 40-43 over the CF-BUS 82 as output geometry packets. The 32-16 circuit 550 compresses the color
and normal data from 32 bit IEEE floating-point format into 16 bit fixed-point format. Thereafter, the floating-point
processors 40-43 receive the output geometry packets with the compressed color and normal data
components, and decompress the color and normal components back into 32 bit IEEE floating-point values.

The compression of color and normal data components of the reformatted vertex packets does not
substantially affect the ultimate image quality for the graphics accelerator 24 because the color components of
the reformatted vertex packets are represented as eight bit values in the frame buffer 100. Similarly, normal
components of the output geometry packets having a 16 bit unsigned accuracy represent a resolution of
approximately plus or minus one inch at one mile. On the other hand, the data compression of color and normal
components of the reformatted vertex packets reduces the data transfer bandwidth over the CF-BUS 82 by
approximately 25 percent.
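
By way of illustration only, the compression of a color component may be sketched as follows. The text
does not specify the layout of the 16 bit fixed-point format, so the sketch assumes an unsigned value in
which 1.0 maps to 0xFFFF; the names compress, decompress and to_frame_buffer are hypothetical.

```python
SCALE = 0xFFFF  # assumption: 1.0 maps to the largest unsigned 16 bit value


def compress(value):
    """Quantize a color component in [0.0, 1.0] to an unsigned 16 bit integer."""
    clamped = min(max(value, 0.0), 1.0)
    return round(clamped * SCALE)


def decompress(code):
    """Convert a 16 bit code back into a floating-point color component."""
    return code / SCALE


def to_frame_buffer(value):
    """Quantize a color component to the eight bit depth of the frame buffer."""
    return round(min(max(value, 0.0), 1.0) * 255)


# Worst-case error introduced by a compress/decompress round trip.
max_round_trip_error = 0.5 / SCALE
```

Under this assumption the round trip error is at most half of one 16 bit step, which is far smaller than one
step of the eight bit color values stored in the frame buffer.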

Figure 4 is a block diagram of the floating-point processor section 45, which includes the floating-point
processor 40 and a control store (CS) 149. The floating-point processor 40 is comprised of an input circuit 141,
an output circuit 145, a register file 142, a set of functional units 143, a control circuit 144, and a SRAM interface
circuit 146. The floating-point processor 40 implements an internal subroutine stack and block load/store
instructions for transfers to the CS 149, as well as integer functions.

The floating-point processor 40 receives the output geometry packets over a data portion 181 of the
CF-BUS 82. The command preprocessor 30 transfers control signals over a control portion 182 of the CF-BUS
82 to enable and disable the input buffer 141.

The functional units 143 implement a floating-point multiplier, a floating-point ALU, a floating-point
reciprocal operation, a reciprocal square-root operation, and an integer ALU. The output circuit 145 transfers draw
packets over a data portion 183 of the CD-BUS 80. The output circuit 145 also transfers control signals over
a control portion 184 of the CD-BUS 80 to synchronize data transfer to the draw processors 50-54 and to
coordinate bus activity on the CD-BUS 80 with the command preprocessor 30.

For one embodiment, the input circuit 141 and the output circuit 145 each contain 64 registers for buffering
geometry data. The register file 142 is comprised of one hundred and sixty 32 bit registers.

The SRAM interface 146 communicates with a control store (CS) 149 over a control store address bus
147 and a control store data bus 148. For one embodiment, the control store address bus 147 is 17 bits wide
and the control store data bus 148 is 32 bits wide. The control store 149 is comprised of four 128k by eight bit
SRAMs.
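
By way of illustration only, the stated bus widths may be checked against the stated SRAM organization as
follows. The sketch assumes that the four SRAMs sit side by side, each supplying one byte of every 32 bit
control store word; the text does not state the arrangement explicitly.

```python
ADDRESS_BITS = 17        # width of the control store address bus 147
DATA_BITS = 32           # width of the control store data bus 148
SRAM_WORDS = 128 * 1024  # depth of each SRAM (128k)
SRAM_WIDTH_BITS = 8      # width of each SRAM
SRAM_COUNT = 4

# Words reachable with a 17 bit address.
addressable_words = 2 ** ADDRESS_BITS
# Combined width of four byte-wide SRAMs placed side by side (assumed arrangement).
total_width_bits = SRAM_COUNT * SRAM_WIDTH_BITS
# Total control store capacity in bytes.
capacity_bytes = addressable_words * DATA_BITS // 8
```

Under this assumption, a 17 bit address selects one of 128k locations in every SRAM, and the four byte-wide
devices together supply the 32 bit data bus, for a total of 512 kilobytes of control store.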

The registers contained in the input circuit 141 are arranged as a pair of 32 register files in a double buffered
fashion. Similarly, the registers contained in the output circuit 145 are arranged as a pair of 32 register double
buffered register files. The micro-code executing on the floating-point processor 40 accesses the registers of
the input circuit 141 and the output circuit 145 as special register files. The instruction set for the floating-point
processor 40 includes commands for requesting and for relinquishing the register files, as well as commands
for queuing completed data packets for transmission over the CD-BUS 80.

The floating-point processor 40 implements the triangle setup function for scan conversion by the draw
processors 50-54. The first stage of the triangle setup function sorts the three vertices of a triangle in
ascending y order. The floating-point processor 40 implements a special instruction that reorders a section of a register
file 142 in hardware based upon the results of the last three comparisons of the y coordinates of the vertices.
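
As an illustration only, the reordering driven by three y comparisons can be sketched in software. The function name and the (x, y) tuple layout below are assumptions made for this sketch; the text describes a hardware instruction operating on sections of the register file 142, not this code.

```python
def sort_vertices_by_y(v0, v1, v2):
    """Return three (x, y) vertices in ascending y order.

    The permutation is selected purely from the results of the three
    pairwise y comparisons, mirroring the idea of a reorder that is
    driven by the last three compare results.
    """
    c01 = v0[1] <= v1[1]
    c12 = v1[1] <= v2[1]
    c02 = v0[1] <= v2[1]
    if c01 and c12:
        return v0, v1, v2
    if c01:                      # v0 <= v1 and v2 < v1
        return (v0, v2, v1) if c02 else (v2, v0, v1)
    if c12:                      # v1 < v0 and v1 <= v2
        return (v1, v0, v2) if c02 else (v1, v2, v0)
    return v2, v1, v0            # v2 < v1 < v0
```

Each of the six possible orderings maps to exactly one branch, so no data-dependent loop is needed.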

A clip testing function implemented in the floating-point processors 40 computes a vector of clip condition
bits. Each of the floating-point processors 40-43 implements a special clip test instruction that computes pairs of the
clip condition bits, while shifting the clip condition bits into a special clip register. After the clip condition bits
have been computed, special branch instructions decode the clip condition bits contained in the clip register
into the appropriate clip condition. The floating-point processor 40 implements separate branch instructions
for clipping triangles and vectors. The special branch instructions enable testing of multiple clip conditions
within the same instruction.
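
A minimal sketch of this idea follows. The text does not state which comparisons produce the clip condition bits, so the sketch assumes a conventional homogeneous test of each coordinate against -w and +w, and a conventional accept/reject/clip decode. All function names are hypothetical.

```python
def clip_test(clip_reg, coord, w):
    """Shift one pair of clip condition bits into the clip register.

    Assumption: the pair is (coord < -w, coord > w).
    """
    below = 1 if coord < -w else 0
    above = 1 if coord > w else 0
    return ((clip_reg << 2) | (below << 1) | above) & 0xFFFFFFFF


def vertex_clip_bits(x, y, z, w):
    """Accumulate the x, y and z bit pairs for one vertex."""
    reg = 0
    for coord in (x, y, z):
        reg = clip_test(reg, coord, w)
    return reg


def classify_triangle(bits0, bits1, bits2):
    """Decode three vertex bit vectors into one clip condition."""
    if (bits0 | bits1 | bits2) == 0:
        return "accept"          # all vertices inside every plane
    if bits0 & bits1 & bits2:
        return "reject"          # all vertices outside one common plane
    return "clip"
```

Testing several planes at once with a single AND or OR is what allows one branch to stand in for multiple clip conditions.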

Figure 5 is a block diagram of the draw processor 50. The draw processor 50 comprises a 3D geometry
pipeline interface circuit 200, a rendering circuit 210, a direct port bus interface circuit 220, and a memory
control circuit 230. The draw processors 51-54 are each substantially similar to the draw processor 50.

The rendering circuit 210 implements a high accuracy digital differential analyzer (DDA) algorithm that
enables sub-pixel accuracy using thirty-two bit internal processing units. Aliased and anti-aliased lines and dots
are rendered in the distributed manner previously described, wherein the draw processor 50 processes every
fifth pixel along a scan line.

The draw processor 50 also implements the rendering portions of the X11 operations in coordination with
the X11 operations circuit of the command preprocessor 30. The X11 operations include reading and writing
of groups of pixels for vertical scrolls, raster operations and stencil operations.

The 3D geometry pipeline interface circuit 200 provides a double buffered arrangement for receiving output
geometry packets over a data portion 242 of the CD-BUS 80. The 3D geometry pipeline interface circuit 200
also transfers control signals over a draw load signal line 240 and a draw buffer available signal line 241 to
coordinate data transfer with the command preprocessor 30 and the floating-point processors 40-43. The 3D
geometry pipeline interface circuit 200 is arranged such that new geometry data is loaded over the CD-BUS
80 while old geometry data is being rendered by the rendering circuit 210.

The rendering circuit 210 performs the edgewalking function in a single pixel cycle time in order to
prevent slowing of the scan conversion function. This speed is needed because
the edgewalking circuit must advance to a next scan line up to five times more often than would be required
of a single external edgewalking chip. The rendering circuit 210 performs rasterization algorithms for triangles,
anti-aliased vectors, aliased vectors, anti-aliased dots, and aliased dots.

The memory control circuit 230 generates addresses over a frame buffer address bus 252 and control
signals over a frame buffer control bus 250 to transfer pixel data over a frame buffer data bus 254 to the
interleave bank_0 61.

The 3D geometry pipeline interface circuit 200 receives draw packets over the CD-BUS 80. The command
type is encoded in the first word of each draw packet. The command type determines the sequence and format
of each word in the draw packet. The 3D geometry pipeline interface circuit 200 routes each word of
draw packet data into an appropriate internal double buffer register. After a complete draw packet is assembled in
the double buffer registers, the 3D geometry pipeline interface circuit 200 initiates a handshake sequence to
load the command words from the draw packet into a set of current buffer registers 212 in the rendering circuit
210.

The 3D geometry pipeline interface circuit 200 contains circuitry for adjusting all X coordinates of the draw
packet according to a draw processor interleave value assigned to the draw processor 50.

The direct port bus interface circuit 220 receives direct port packets over the CD-BUS 80. The direct port
bus interface circuit 220 controls the execution of direct port commands. The direct port bus interface circuit
220 also controls the direct port handshake with the command preprocessor 30 according to a direct port strobe
signal 244 and a direct port buffer available signal 245. The direct port bus interface circuit 220 assembles the
direct port command and controls a handshake sequence that causes the necessary frame buffer access by
the memory control circuit 230. Read data destined for the command preprocessor 30 is placed in a read
buffer in the direct port bus interface circuit 220, which is read by the command preprocessor 30 over the
CD-BUS 80.

The rendering circuit 210 performs edge walking and span interpolation functions for triangles, performs
a simple DDA function for vectors, and performs a pass operation for dots. The rendering circuit 210 also
performs end point correction, antialiasing alpha calculation, and computation of depth cue scale factors. The
rendering circuit 210 generates (x,y), (r,g,b,z), and (alpha) values for each pixel rendered to the interleave bank_0
61.
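
For readers unfamiliar with the technique, a simple DDA for a vector can be sketched as follows. This is a generic textbook formulation, not the circuit described here: the 16.16 fixed-point format, the integer endpoints, and the function name are all assumptions of the sketch.

```python
FRAC_BITS = 16  # assumed 16.16 fixed point inside a 32-bit word


def dda_vector(x0, y0, x1, y1):
    """Return the pixels of a line from (x0, y0) to (x1, y1).

    The major axis advances one pixel per step; the minor axis
    advances by a fixed-point fraction accumulated with sub-pixel
    precision.
    """
    dx, dy = x1 - x0, y1 - y0
    steps = max(abs(dx), abs(dy))
    if steps == 0:
        return [(x0, y0)]
    x_inc = (dx << FRAC_BITS) // steps
    y_inc = (dy << FRAC_BITS) // steps
    half = 1 << (FRAC_BITS - 1)          # round to nearest pixel
    x = (x0 << FRAC_BITS) + half
    y = (y0 << FRAC_BITS) + half
    pixels = []
    for _ in range(steps + 1):
        pixels.append((x >> FRAC_BITS, y >> FRAC_BITS))
        x += x_inc
        y += y_inc
    return pixels
```

In a five-way interleaved arrangement, each draw processor would keep only the pixels of such a list that fall in its own columns.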

The rendering circuit 210 stores the (x,y), (r,g,b,z), and (alpha) values into a set of double buffer registers
214. The rendering circuit 210 then controls a handshake sequence to load the values into an appropriate set
of double buffer registers in the memory control circuit 230.

The memory control circuit 230 receives requests for frame buffer access from the rendering circuit 210,
the direct port bus interface circuit 220, and video/DRAM refresh circuitry (not shown). The memory control
circuit 230 arbitrates among the requests, and generates the necessary control signals to read/write pixels to
the VRAM interleave bank_0.

The memory control circuit 230 also performs address and data related functions. The address related
functions include address translation, viewport clipping and page mode access detection. The data related
functions include blending and logical operations on data, z buffering, window ID checking, and screen door
transparency.

The memory control circuit 230 also includes a video refresh counter for VRAM transfer cycles. The
memory control circuit 230 performs DRAM refresh using CAS-before-RAS cycles.

Figure 6 illustrates the 3D geometry pipeline interface circuit 200. The 3D geometry pipeline interface
circuit 200 receives draw packets for the 3D geometry pipeline over the CD-BUS 80. The 3D geometry pipeline
interface circuit 200 unpacks the draw packets into a set of double buffer registers 270-272. The double buffer
registers 270-272 comprise 24 double buffer registers (dbr0 - dbr23). The 3D geometry pipeline interface circuit
200 comprises a DDA bus control circuit 202, and a DDA data path circuit comprising a draw interleave circuit
204, an adder 206, and a multiplexer 208.

The adder 206 adjusts all x coordinates in the draw packets according to the draw processor interleave
value assigned to the draw processor 50. The adder 206 receives draw packets for the 3D geometry pipeline
over the CD-BUS 80. The draw interleave circuit 204 contains an interleave register that stores a draw
processor interleave value assigned to the draw processor 50. The draw interleave circuit 204 generates a value
equal to five minus the draw processor interleave value stored in the interleave register. The adder 206 adds
the value generated by the draw interleave circuit 204 to all x coordinates in the draw packets before transfer
to the double buffer registers 270-272.
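
The arithmetic of this adjustment can be sketched as below. The sketch assumes, as a labeled guess, that the draw processor with interleave value k renders the pixels whose x coordinate satisfies x mod 5 = k; the text states only that each draw processor handles every fifth pixel. Under that assumption, a pixel belongs to a draw processor exactly when its adjusted x is a multiple of five. Function names are hypothetical.

```python
NUM_DRAW_PROCESSORS = 5


def interleave_offset(interleave):
    """Value generated by the draw interleave circuit: 5 - interleave."""
    if not 0 <= interleave < NUM_DRAW_PROCESSORS:
        raise ValueError("interleave value must be in the range 0 to 4")
    return NUM_DRAW_PROCESSORS - interleave


def adjust_x(x, interleave):
    """Adder stage: add the interleave offset to an x coordinate."""
    return x + interleave_offset(interleave)


def owned_pixels(x_start, x_end, interleave):
    """Pixels in [x_start, x_end] whose adjusted x is a multiple of 5.

    Assumption: these are the pixels rendered by this draw processor.
    """
    return [x for x in range(x_start, x_end + 1)
            if adjust_x(x, interleave) % NUM_DRAW_PROCESSORS == 0]
```

For example, `owned_pixels(0, 12, 2)` selects columns 2, 7 and 12, and the five interleave values together cover every column exactly once.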

The DDA bus control circuit 202 generates a set of control signals 280 to load the draw packets into the
double buffer registers 270-272. The first word (header) of each draw packet identifies the packet type, the
format of the packet, and the length of the packet. The DDA bus control circuit 202 loads the header into the first
double buffer register (dbr0). Based on the header information, the DDA bus control circuit 202 loads the
remainder of the double buffer registers (dbr1-dbr23).

Figure 7 lists the format of the 3D geometry pipeline commands contained in draw packets. The double
buffer register (dbr1-dbr23) employed for each parameter in the command packet is also shown.

The DDA bus control circuit 202 handshakes with the DDA circuit 210 to ensure that the commands are
transferred to the current buffer registers 212 in an orderly fashion. The DDA bus control circuit 202 generates
a load current buffer signal 281 to simultaneously transfer the contents of the double buffer registers 270-272
to the corresponding current buffer registers 212. The DDA bus control circuit 202 then asserts the draw buffer
available signal 241 to enable transfer of another output geometry packet into the double buffer registers
270-272. A DDA ready signal 282 is received from the DDA circuit 210 after the last command is completed.
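
The overlap of packet loading and rendering that this handshake permits can be modeled with a small state machine. The class below is a hypothetical software model written for illustration: the attribute names mirror the signals named in the text, but the sequential loading of words and the exact ordering of events are assumptions.

```python
class GeometryPipelineInterface:
    """Toy model of the double buffer / current buffer handshake."""

    NUM_REGISTERS = 24  # dbr0 through dbr23

    def __init__(self):
        self.double_buffer = [0] * self.NUM_REGISTERS
        self.current_buffer = [0] * self.NUM_REGISTERS
        self.draw_buffer_available = True    # models signal 241
        self.dda_ready = True                # models signal 282
        self.packet_waiting = False

    def load_packet(self, words):
        """Load a draw packet, header first, into the double buffer."""
        if not self.draw_buffer_available:
            return False                     # sender must wait
        if len(words) > self.NUM_REGISTERS:
            raise ValueError("packet longer than the double buffer")
        padding = [0] * (self.NUM_REGISTERS - len(words))
        self.double_buffer = list(words) + padding
        self.draw_buffer_available = False
        self.packet_waiting = True
        return True

    def try_load_current(self):
        """Models the load current buffer signal 281."""
        if not (self.packet_waiting and self.dda_ready):
            return False
        self.current_buffer = list(self.double_buffer)
        self.packet_waiting = False
        self.dda_ready = False               # rendering is now busy
        self.draw_buffer_available = True    # next packet may arrive
        return True

    def finish_render(self):
        """Rendering of the current command has completed."""
        self.dda_ready = True
```

While the rendering side is busy with one packet, `load_packet` can already accept the next one, which is the point of the double buffered arrangement.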

The direct port bus interface circuit 220 in conjunction with the memory control circuit 230 executes the 
commands received through the direct port pipeline over the CD-BUS 80. The direct port bus interface circuit 
220 comprises a data path circuit 222 and a control circuit 224. 

The data path circuit 222 routes 'y' and 'x' addresses received over the CD-BUS 80 into the memory
control circuit 230. For example, the data path circuit 222 routes 'y' and 'x' addresses from the CD-BUS 80 to the
memory control circuit 230 during pixel read/write direct port commands. During block copy, vertical scroll, and
fill direct port commands, the pixel address is generated by the draw processor 50 as described below.

The data path circuit 222 contains a set of instr source registers and instr destination registers. The instr
source registers and instr destination registers are employed during the Blt, the vertical scroll, and the fill
category direct port commands. The instr source registers comprise an 'xs' and a 'ys' register, and the instr destination
registers comprise an 'xd', a 'yd', and a 'countd' register. The xs, xd, ys, and yd registers auto-increment or
auto-decrement as specified by a corresponding direction bit.

During the read-X-write Blt direct port command, the draw processor 50 performs a read access of the
frame buffer 100 at (xs, ys) and then performs a write access at (xd, yd).

During a stencil operation through the direct port pipeline, the data path circuit 222 extracts a single bit
from the data field according to the corresponding interleave number for the draw processor 50. The data path
circuit 222 then adds the extracted bit to the mask bit.

The data path circuit 222 also performs data formatting for pixel 'byte write' access mode and the byte
mode block copy direct port command. The data path circuit 222 also contains a double buffered write buffer
for block copy direct port operations, and contains logic for byte extraction during byte mode block copy direct
port operations.

The control circuit 224 decomposes the incoming direct port packets into pixel read, pixel write, and load
register instructions for the memory control circuit 230. The control circuit 224 also implements a handshake
sequence with the command preprocessor 30 to transfer direct port commands/data between the command
preprocessor 30 and the draw processor 50.

A pixel write direct port command writes data to one of the three plane groups: image, depth, or window.
The pixel write operation is performed according to either a state register set 0 or a state register set 1 as
specified by a state set bit in the direct port packet header. Similarly, a pixel read direct port command reads data
from one of the three plane groups: image, depth, or window. The pixel read operation is performed according
to either the state register set 0 or the state register set 1 as specified by a state set bit in the direct port packet
header.

During a stencil write direct port operation, the command preprocessor 30 extracts the data and mask bits
from the appropriate bits in the direct port input packet. If the mask bit is 0, then the pixel is not modified. If
the mask bit is 1, then the pixel is written with the foreground color. If the draw processor 50 is in OR mode
and the mask bit is 0, then the pixel is unmodified. If the draw processor 50 is in AND mode and the mask bit
is 0, then the pixel is written with the background color. The stencil operation is performed according to the
state register set 0.

The fill direct port command causes the draw processor 50 to transfer up to 4 pixels starting at (xd, yd)
to the frame buffer 100. The data source for the fill direct port command is a stencil/fill foreground register.
The frame buffer 100 is accessed with one random and three page mode read-modify-write cycles.

Direct port block copy operations are performed with separate instructions: the blt-read, the
blt-read-transmit, and the blt-write-transmit-read instructions. In byte mode, four pixels are read, written and transmitted
with each command; otherwise, one pixel is read, written and transmitted with each command.

The blt-read direct port instruction causes the draw processor 50 to read a pixel from (xs,ys) and store
the pixel in a block copy read buffer. The xs register is then incremented and the x-source-size register is
decremented.

The blt-read-transmit direct port instruction causes the draw processor 50 to read a pixel from (xs,ys) and
store the pixel in a block copy read double buffer. The command preprocessor 30 reads the pixel from the block
copy read buffer. Thereafter, the pixel data is moved from the read double buffer to the read buffer. The xs
register is then incremented and the size register is decremented.

The blt-write-transmit-read direct port instruction causes the draw processor 50 to copy the data from the
write buffer into the write double buffer, and then write the data to the frame buffer 100 at (xd, yd). The xd
register is then incremented and the x-destination-size register is decremented. The draw processor 50 then
reads a pixel from the frame buffer 100 at (xs,ys) and transfers the pixel to the block copy read double buffer.
The command preprocessor 30 reads the pixel from the block copy read buffer. Thereafter, the pixel data is
moved from the block copy read double buffer to the block copy read buffer. The xs register is then incremented
and the size register is decremented.

The memory control circuit 230 performs viewport clipping, logical ops, blending ops, alpha
calculation, Z-Buffering, window clipping, screen door transparency, cursor plane access, picking, video refresh,
DRAM refresh, fast clear plane management and VRAM control.

The memory control circuit 230 viewport clips memory accesses to the frame buffer 100 for both the 3D
geometry pipeline and the direct port pipeline.

When a pick hit is detected, the draw processor 50 freezes the 3D geometry pipeline after rendering the
primitive that caused the pick hit. The pick ID registers are preserved. The draw processor 50 asserts a draw
processor interrupt to the command preprocessor 30 when a pick hit is detected.

The host processor 20 programs a set of control registers in the draw processor 50. The control registers
hold operating parameters for both the 3D geometry pipeline and the direct port pipeline. The command
preprocessor 30 receives direct port data over the host bus 28 targeted for the control registers of the draw
processor 50, and transfers the direct port data over the CD-BUS 80 to the draw processor 50 as direct port packets.

The control registers contained in the draw processor 50 are separated into global and state set dependent
registers for the direct port pipeline (state set 0) and the 3D geometry pipeline (state set 1). For some
operations, a separate register is provided in both state set 0 and 1. The global registers have one register that serves
both state sets and are accessible over both the direct port and the 3D geometry pipeline.

The host processor 20 broadcasts writes to all of the draw processors 50-54, and also writes to a specific
draw processor 50-54. The broadcast write is employed to program a constant attribute for all of the draw
processors 50-54. The host processor 20 also reads the control registers of the draw processors 50-54.

The control registers for the draw processor 50 include a control and status register (CSR). The CSR is
a read only register over the direct port. The bit fields for the CSR are as follows:
D<7> = Reset 3D Geometry Port
D<6:4> = Number of PIDs - indicates the number of pick IDs in the draw processor pipeline.
D<3> = Pick Hit - indicates that the draw processor 50 detected a pixel in the pick aperture. The Pick Hit field
is reset by the command preprocessor 30 over the direct port.
D<2> = Semaphore - indicates that the draw processor 50 semaphore is set.
D<1> = Stall Acc - indicates that a Stall Accelerator signal was received from the command preprocessor 30.
D<0> = Acc Stalled - indicates that an Accelerator Stalled signal is being sent to the command preprocessor
30.
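
The CSR layout above can be decoded with a few shifts and masks. The following sketch is illustrative; the function name and the dictionary keys are invented for this example, while the bit positions are taken from the list above.

```python
def decode_csr(value):
    """Split an 8-bit CSR value into its named fields."""
    return {
        "reset_3d_geometry_port": bool((value >> 7) & 1),  # D<7>
        "number_of_pids": (value >> 4) & 0x7,              # D<6:4>
        "pick_hit": bool((value >> 3) & 1),                # D<3>
        "semaphore": bool((value >> 2) & 1),               # D<2>
        "stall_acc": bool((value >> 1) & 1),               # D<1>
        "acc_stalled": bool(value & 1),                    # D<0>
    }
```

For instance, `decode_csr(0b10110101)` reports a pending reset, three pick IDs in the pipeline, the semaphore set, and the accelerator stalled.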

The control registers for the draw processor 50 include the draw processor interleave register which
specifies the draw processor interleave value. The draw processor interleave value identifies the draw processor
50 and is in the range of 0 to 4.

A write to a semaphore register sets a draw processor semaphore and sets bit 2 of the CSR. The draw
processor 50 does not execute 3D geometry pipeline commands while the semaphore bit is set. The draw
processor semaphore register is strobed over both the 3D geometry pipeline and the direct port pipeline. A write
to a clear semaphore register clears the draw processor semaphore and clears bit 2 of the CSR. The draw
processor semaphore register is strobed over the direct port pipeline.

A write to a set stall register sets bit 1 of the CSR. The draw processor 50 does not execute any 3D
geometry pipeline commands while bit 1 of the CSR is set. The command preprocessor 30 uses bit 1 of the CSR
to lock out the 3D geometry pipeline during execution of Blt or fill direct port operations. A write to a clear stall
register clears bit 1 of the CSR to enable the draw processor 50 to resume updates of the frame buffer 100.

A write to a reset geometry port register resets the 3D geometry pipeline to the draw processor 50 by
setting bit 7 of the CSR (reset), and by resetting bit 3 (pick hit), bit 2 (semaphore), and bit 0 (stall). A write to a
clear geometry port reset register clears the 3D geometry pipeline reset condition and clears bit 7 of the CSR.

A frame buffer width register selects the maximum horizontal resolution of the frame buffer 100, and
enables stereo addressing mode.

A draw processor attribute register controls several functions of the draw processor 50. For a blend
operation (bits 24 and 23) or raster operations (ROP) (bits 21 through 18), the state set 0 specifies the current
ROP. The current ROP only affects the RGBO planes, depending on the current access mode plane group
selected by the plane group enable field (bits 11 through 5). The state set 1 specifies the current boolean ROP
and enables the blend circuitry. The current ROP affects the RGBO, the cursor data, and the cursor enable
planes. The current blend operation affects the RGB planes.

The bit fields of the draw processor attribute register are defined below, wherein SRC is the source data,
DST is the data at the frame buffer destination, BG is the background, and DDA is digital differential analyzer.
D<31:29> = Pick Control
1xx Enable picking
x1x Render while picking
xx1 3-D Pick Aperture
The codes enable the following functions:
0xx Disable picking
100 2-D Bound Pick without Render
101 3-D Bound Pick without Render
110 2-D Bound Pick while Render
111 3-D Bound Pick while Render
The 2D pick uses X and Y bounds. The 3D pick uses X, Y, and Z bounds.
D<28> = Depth Cue Enable
D<27> = Screen Door Enable - enables the screen door transparency function:
0 Solid - draw all pixels
1 Transparent - use screen door pattern
D<26> = Force Color Enable
0 SRC = DDA color
1 SRC = Color register
D<25> = Antialias Enable
0 Constant alpha
1 Antialias filter alpha
D<24> = Blend Function Select
0 Blend to background: (SRC - BG) * alpha + DST
1 Blend to frame buffer: (SRC - DST) * alpha + DST
D<23> = Blend Enable
0 Perform a raster operation as specified by ROP Code in bits 21-18.
1 Perform a blend operation as specified by Antialias Enable in bit 25 and Blend Function Select in bit 24.
D<22> = BLT Source Buffer B
0 Disable Buffer B (enable Buffer A)
1 Enable Buffer B (disable Buffer A)

B DST = ~SRC or DST
C DST = SRC; this is the default case when not doing a ROP operation
D DST = SRC or ~DST
E DST = SRC or DST
F DST = all bits one
D<17:14> = Z-Buffering Control - controls the use of the depth planes as follows:
1xxx HSR (Hidden Surface Removal) enable
x1xx Z write enable
xx1x Constant Z enable
xxx1 WID extension clip enable
Figure 8 shows the coding of the Z-Buffering control field for hidden surface removal (HSR) functions using
Z as a depth buffer. Figure 9 shows the coding of the Z-Buffering control field for window ID (WID) functions
using Z as a window ID extension.
D<13:12> = Stereo Control: Bit 12 defines the current window application as stereo (1) or "mono" (0). Bit 13
specifies the desired half of the frame buffer (0 for left or 1 for right).
x0 Mono
x1 Stereo
0x Left
1x Right

D<11:5> = Plane Group Enable - enables and disables individual plane groups. A pixel in a particular plane is
updated only if that plane group is enabled and the write mask is "1". The function is modified for fast clear
windows. Each bit (0 = disable, 1 = enable) controls a specific plane as follows:
Bit 11 Window ID group enable
Bit 10 Fast Clear operation enable (see bits 3:1)
Bit 9 Red plane enable
Bit 8 Green plane enable
Bit 7 Blue plane enable
Bit 6 Overlay plane enable
Bit 5 Z plane enable
D<4> = Force Current WID:
0 Do not force Current Window ID.
1 The contents of the Current Window ID plane replace the contents of the frame buffer on every write.
The final write to the WID planes is controlled by the plane mask and the window plane group enable
bit.

D<3:1> = Fast Clear Plane Select - identifies fast clear planes for the current window
0 Fast Clear Plane 0
1 Fast Clear Plane 1
2 Fast Clear Plane 2
3 Fast Clear Plane 3
4 Fast Clear Plane 4
5 Fast Clear Plane 5
6 Fast Clear Plane 6
7 Fast Clear Plane 7
D<0> = Buffer B Select - provides double buffer control for the RGB planes. Buffer B Select is set to 1 to select
Buffer B as the target for all read, write and read-modify-write accesses of the frame buffer 100. For the Block
Copy operation, bit 22 (BLT Source Buffer B) provides an independent selection of the source buffer to enable
block copy operations between Buffer A and Buffer B. Buffer B Select is coded as follows:
0 Select buffer A; if bit 22 is set, select buffer B as source.
1 Select buffer B; if bit 22 is clear, select buffer A as source.
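
As a reading aid, the attribute register fields can be unpacked in software, and the two blend functions evaluated directly. The sketch below is an assumption-laden illustration: the helper and key names are invented, Force Color Enable is placed at bit 26 because that is the only bit left unassigned between bits 27 and 25, and the ROP code field is placed at bits 21 through 18 as stated in the Blend Enable description.

```python
def field(value, hi, lo):
    """Extract bits hi..lo (inclusive) of a 32-bit register value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_attributes(value):
    """Unpack the draw processor attribute register into named fields."""
    return {
        "pick_control": field(value, 31, 29),
        "depth_cue_enable": field(value, 28, 28),
        "screen_door_enable": field(value, 27, 27),
        "force_color_enable": field(value, 26, 26),   # assumed position
        "antialias_enable": field(value, 25, 25),
        "blend_function_select": field(value, 24, 24),
        "blend_enable": field(value, 23, 23),
        "blt_source_buffer_b": field(value, 22, 22),
        "rop_code": field(value, 21, 18),
        "z_buffering_control": field(value, 17, 14),
        "stereo_control": field(value, 13, 12),
        "plane_group_enable": field(value, 11, 5),
        "force_current_wid": field(value, 4, 4),
        "fast_clear_plane_select": field(value, 3, 1),
        "buffer_b_select": field(value, 0, 0),
    }


def blend(src, dst, bg, alpha, blend_function_select):
    """Evaluate the blend selected by D<24> for one color component."""
    if blend_function_select:
        return (src - dst) * alpha + dst   # blend to frame buffer
    return (src - bg) * alpha + dst        # blend to background
```

With `blend_function_select` set and alpha equal to 0.5, a source of 1.0 over a destination of 0.0 yields 0.5, the midpoint of the two.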

A stencil/fill foreground color register specifies the frame buffer pixel data during a stencil mode access.
The stencil bits that are set to 1 use the foreground color. During a fill operation, the data in the stencil/fill
foreground color register is written to the frame buffer 100 for every pixel. The format of the data in the stencil/fill
foreground color register depends on the selected plane group as shown above. For image planes, the bit fields
of the stencil/fill foreground color register are as follows:
D<31:24> = Overlay - specifies the overlay pixel value.
D<23:16> = Blue - specifies the blue pixel value.
D<15:8> = Green - specifies the green pixel value.
D<7:0> = Red - specifies the red pixel value.
For depth planes, the bit fields of the stencil/fill foreground color register are as follows:
D<23:0> = Depth - specifies the depth (Z) of the pixel.
For window planes, the bit fields of the stencil/fill foreground color register are as follows:
D<15:10> = Fast Clear - each bit specifies a fast clear plane: bit 10 specifies plane 0, bit 11 specifies plane 1,
and so forth.
D<9:6> = Overlay Window ID Planes - specifies the Overlay Window ID plane.
D<5:0> = Image Window ID Planes - specifies the Image Window ID plane.

A stencil background color register specifies the frame buffer pixel data during a stencil mode access. The
stencil bits set to 0 use the background color if the transparency flag is 0. For image planes, the bit fields of the
stencil background color register are as follows:
D<31:24> = Overlay - specifies the overlay pixel value.
D<23:16> = Blue - specifies the blue pixel value.
D<15:8> = Green - specifies the green pixel value.
D<7:0> = Red - specifies the red pixel value.
For depth planes, the bit fields of the stencil background color register are as follows:
D<23:0> = Depth - specifies the depth (Z) of the pixel.
For window planes, the bit fields of the stencil background color register are as follows:
D<15:10> = Fast Clear - specifies the fast clear plane.
D<9:6> = Overlay Window ID Planes - specifies the Overlay Window ID plane.
D<5:0> = Image Window ID Planes - specifies the Image Window ID plane.
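
For image planes, both color registers share the same overlay/blue/green/red byte layout. A short sketch of packing and unpacking such a word follows; the function names are hypothetical, and only the image plane format is covered.

```python
def pack_image_color(red, green, blue, overlay=0):
    """Pack 8-bit components into the image plane register format."""
    for component in (red, green, blue, overlay):
        if not 0 <= component <= 0xFF:
            raise ValueError("components must fit in 8 bits")
    return (overlay << 24) | (blue << 16) | (green << 8) | red


def unpack_image_color(word):
    """Return (red, green, blue, overlay) from a packed register word."""
    return (word & 0xFF,
            (word >> 8) & 0xFF,
            (word >> 16) & 0xFF,
            (word >> 24) & 0xFF)
```

Note that red occupies the least significant byte, so a pure red foreground is the word 0x000000FF.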

The block copy function of the draw processor 50 copies a source rectangle to a destination rectangle.
The command preprocessor 30 programs the block copy function into the state set 0 registers that control pixel
access as well as a copy/scroll source address register, a copy/scroll/fill size register, and a copy/scroll/fill
destination address register.

The copy/scroll source address register specifies the initial frame buffer source address for a rectangle
copy operation. The bit fields are defined below:
D<31:30> = Source Group - specifies the copy source plane group as follows.
000 Image plane group
001 Depth plane group
010 Window plane group
100 Image plane group: overlay
101 Image plane group: blue
110 Image plane group: green
111 Image plane group: red
D<25:16> = Source Y - specifies the Y source address (10 bits). Valid values are in the range 0 through 1023.
D<7:0> = Source X - specifies the X source address (8 bits). Valid values are in the range 0 through 255. The
X source address is equal to the integer value of the (actual X source address)/5.
Copy/Scroll/Fill Size 

The copy/scroll/fill size register specifies the size for a rectangle copy or fill operation, and the direction
for a copy operation. The bit fields are defined below:
D<31> = Copy Direction - specifies the copy direction as follows:
0 Outer loop: top to bottom, inner loop: left to right, start at upper left.
1 Outer loop: bottom to top, inner loop: right to left, start at lower right. For fills, set to 0.
D<7:0> = Size - specifies the x size (8 bits) of the block. Valid values are in the range 0 through 255. The x
size is equal to the integer value of the (actual x size)/5.
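
Because each draw processor sees only every fifth column, the X address and x size fields hold the actual value divided by five, with the fraction discarded. A sketch of building the two register words is shown below. The function names are hypothetical, and the Source Group field is deliberately left at zero because its width is not unambiguous in the listing above.

```python
INTERLEAVE = 5  # five draw processors share each scan line


def pack_copy_source(actual_x, y):
    """Build a copy/scroll source address word (source group left 0)."""
    x_field = actual_x // INTERLEAVE
    if not 0 <= x_field <= 255:
        raise ValueError("X source address out of range")
    if not 0 <= y <= 1023:
        raise ValueError("Y source address out of range")
    return (y << 16) | x_field            # D<25:16> = Y, D<7:0> = X


def pack_copy_size(actual_x_size, bottom_to_top=False):
    """Build a copy/scroll/fill size word."""
    size_field = actual_x_size // INTERLEAVE
    if not 0 <= size_field <= 255:
        raise ValueError("x size out of range")
    return (int(bottom_to_top) << 31) | size_field   # D<31>, D<7:0>
```

An 8-bit X field therefore spans actual columns 0 through 1279.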

The copy/scroll/fill destination address register specifies the initial frame buffer destination address for a
rectangle copy operation. The bit fields are defined below:
D<31> = Byte Mode - specifies Byte or Pixel mode. Pixel mode is used for fills.
D<30:29> = Destination Group - specifies the destination plane group for pixel mode. Byte mode always uses
the image plane group.
00 Image plane group
01 Depth plane group
10 Window plane group
11 Image plane group + Depth plane group
D<28> = Ext - specifies the destination width extension. The destination width is 1 less than the size.
D<25:16> = Destination Y - specifies the Y destination address (10 bits). Valid values are in the range 0 through
1023.
D<7:0> = Destination X - specifies the X destination address (8 bits). Valid values are in the range 0 through
255. The X destination address is equal to the integer value of the (actual X destination address)/5.

Viewport clipping in the draw processor 50 is controlled by a view clip minimum bound register and a view
clip maximum bound register. The view clip minimum bound register specifies the viewport top and left
(inclusive) boundaries which are the coordinates of the top left corner. The draw processor 50 clips a pixel if Y < top
or if X < left, wherein Y=0 is the top of the screen and X=0 is the left side of the screen. The bit fields of the
view clip minimum bound register are defined below:
D<25:16> = Top Boundary - specifies the top boundary of the clip area. The Y values are 10 bits.
D<10:0> = Left Boundary - specifies the left boundary of the clip area.

The view clip maximum bound register specifies the viewport bottom and right (inclusive) boundaries which
are the coordinates of the bottom right corner. The draw processor 50 clips the pixel if X > right or if Y > bottom,
wherein Y=0 is the top of the screen and X=0 is the left side of the screen. The bit fields of the view clip maximum
bound register are defined below:
D<25:16> = Bottom Boundary - specifies the bottom boundary of the clip area.
D<10:0> = Right Boundary - specifies the right boundary of the clip area.
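
The clip decision described by the two bound registers reduces to four comparisons. The sketch below illustrates it with hypothetical function names; the field positions follow the listings above, with a 10-bit Y field at D<25:16> and an 11-bit X field at D<10:0>.

```python
def unpack_bound(word):
    """Return (y, x) from a view clip bound register word."""
    return (word >> 16) & 0x3FF, word & 0x7FF


def is_clipped(x, y, min_bound_word, max_bound_word):
    """True if the pixel lies outside the inclusive viewport bounds."""
    top, left = unpack_bound(min_bound_word)
    bottom, right = unpack_bound(max_bound_word)
    return y < top or x < left or x > right or y > bottom
```

Since the bounds are inclusive, a pixel exactly on the top left or bottom right corner is drawn.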

The draw processor 50 contains registers that control pick functions. The pick registers include a clear
pick hit register, a set of pick ID registers, a pick minimum bound register, a pick maximum bound register, a
pick front bound register, and a pick back bound register. A write to the clear pick hit register resets the pick
hit output flag and enables the draw processor 50 to update the pick ID (PID) registers.

The PID registers set the current Pick IDs for the draw processor 50. The PID registers comprise a set of
five 32-bit registers. The PID registers are programmed through the 3D geometry pipeline. If picking is enabled,
the draw processor 50 ensures that the PID registers are not updated once a pick hit is detected. The draw
processor 50 also ensures that the PID registers are not updated until all pixels for a prior draw command are
rendered without picking. The host processor 20 cannot update the PIDs via the direct pipeline during picking,
but can render pixels that are subject to picking via the direct pipeline.

The pick maximum bound register specifies the pick aperture bottom and right (inclusive) boundaries which
are the coordinates of the bottom right corner. The draw processor 50 does not generate a pick hit if X > right
or if Y > bottom, wherein Y=0 is the top of the screen and X=0 is the left side of the screen. The bit fields are
defined below:
D<25:16> = Bottom Boundary - specifies the bottom boundary of the pick aperture.
D<10:0> = Right Boundary - specifies the right boundary of the pick aperture.

The pick front bound register specifies the pick aperture front (inclusive) boundary. The draw processor
50 does not generate a pick hit if Z < front and the Pick CSR 3-D field is set. The pick back bound register
specifies the pick aperture back (inclusive) boundary. The draw processor 50 does not generate a pick hit if
Z > back and the Pick CSR 3-D field is set.
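
The pick aperture tests described above can be sketched as follows. This is only an illustration: the function name `inside_pick_aperture` and its parameter names are hypothetical, and only the bottom, right, front, and back tests stated in this passage are modeled.

```python
def inside_pick_aperture(x: int, y: int, z: int, bound_reg: int,
                         front: int, back: int, pick_3d: bool) -> bool:
    """Return False when any stated condition suppresses a pick hit."""
    bottom = (bound_reg >> 16) & 0x3FF   # D<25:16>
    right = bound_reg & 0x7FF            # D<10:0>
    if x > right or y > bottom:
        return False                     # outside the 2-D aperture
    if pick_3d and (z < front or z > back):
        return False                     # outside the depth range (3-D field set)
    return True


# Hypothetical aperture: bottom = 200, right = 300, depth range 10..50.
bound_reg = (200 << 16) | 300
print(inside_pick_aperture(300, 200, 10, bound_reg, 10, 50, True))
print(inside_pick_aperture(300, 200, 9, bound_reg, 10, 50, True))
print(inside_pick_aperture(300, 200, 9, bound_reg, 10, 50, False))
```

The depth test is skipped entirely when the 3-D field is clear, so the last call passes even though Z lies in front of the front boundary.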

The screen door transparency feature for the draw processor 50 is specified by the contents of sixteen
screen door column registers. Each screen door column register is 16 bits wide, yielding a 16 x 16 screen door
transparency pattern. Each screen door column register defines one 16 pixel column of the screen door
transparency pattern. If the pattern bit is one, the object is solid (visible) at the corresponding pixel. If the pattern
bit is zero, the object is transparent and the corresponding pixel is not drawn. The column number is equal to
2 x nn for even columns and 2 x nn + 1 for odd columns, where nn (the register address offset) ranges from 0
to 7 decimal. The bit fields for the screen door column registers are defined below:

D<31:16> = Column Odd - Rows 15 through 0 are specified by bits 31 through 16.
D<15:0> = Column Even - Rows 15 through 0 are specified by bits 15 through 0.
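
The register layout above can be illustrated with a short lookup sketch. The names `screen_door_bit` and `is_drawn` are hypothetical, and the mapping of a screen pixel to a pattern cell by taking X and Y modulo 16 is an assumption that the text does not state.

```python
def screen_door_bit(regs: list[int], col: int, row: int) -> int:
    """Return the pattern bit for column 0..15 and row 0..15.

    regs holds the eight 32-bit values at address offsets nn = 0..7.
    """
    reg = regs[col // 2]                 # nn = column number // 2
    if col % 2:
        half = (reg >> 16) & 0xFFFF      # odd column: D<31:16>
    else:
        half = reg & 0xFFFF              # even column: D<15:0>
    return (half >> row) & 1             # row r is bit r of the column


def is_drawn(regs: list[int], x: int, y: int) -> bool:
    """Assumed tiling: the 16 x 16 pattern repeats across the screen."""
    return screen_door_bit(regs, x % 16, y % 16) == 1


# Hypothetical checkerboard: even columns solid on even rows,
# odd columns solid on odd rows.
regs = [(0xAAAA << 16) | 0x5555] * 8
print(is_drawn(regs, 0, 0), is_drawn(regs, 0, 1), is_drawn(regs, 1, 1))
```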

The fast clear operation in the draw processor 50 is controlled by a fast clear data register and a window
background color register. The contents of the fast clear data register are written to the fast clear planes during
VRAM flash write memory cycles to the frame buffer 100. The fast clear plane mask specifies the bits written.
The bit fields for the fast clear data register are shown below:

D<15:10> = Fast Clear - a 6-bit field written to the fast clear planes during VRAM flash write memory cycles.

The window background color register specifies the window background color (RGBO) used in operations
performed by the draw processor 50. A background of all 1's is used for the Z plane. If the window is a fast
clear window, the background value is substituted for the RGBO data read from any invalid (i.e. fast cleared
but not yet written) pixels during read or read/modify/write cycles, for example, ROP or antialiasing operations.
The window background color is also available as one of the addend sources to the blend function of the draw
processor 50. The bit fields for the window background color register are shown below:

D<31:24> = Overlay - specifies the overlay pixel value.
D<23:16> = Blue - specifies the blue pixel value.
D<15:8> = Green - specifies the green pixel value.
D<7:0> = Red - specifies the red pixel value.
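
The field layout and the substitution rule for fast cleared pixels can be sketched as below. This is a simplified illustration: `unpack_background` and `read_image_pixel` are hypothetical names, and the per-pixel valid flag is modeled as a plain boolean rather than as the fast clear planes.

```python
def unpack_background(reg: int) -> dict[str, int]:
    """Split a window background color value into its four 8-bit fields."""
    return {
        "overlay": (reg >> 24) & 0xFF,   # D<31:24>
        "blue": (reg >> 16) & 0xFF,      # D<23:16>
        "green": (reg >> 8) & 0xFF,      # D<15:8>
        "red": reg & 0xFF,               # D<7:0>
    }


def read_image_pixel(fb_value: int, is_valid: bool, background: int) -> int:
    """Substitute the background for a fast cleared, not yet written pixel."""
    return fb_value if is_valid else background


background = 0x11223344                  # hypothetical register contents
print(unpack_background(background))
print(hex(read_image_pixel(0xDEADBEEF, False, background)))
```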

The draw processor 50 contains a current window ID register and a window ID (WID) clip mask register.
The current window ID register specifies the current window ID code. The current window ID code is forced
into the WID planes if the force current WID attribute is enabled (bit 4 of the draw processor attribute register).
The current window ID code is compared to unmasked WID planes if WID clipping is enabled.

The WID clip mask register specifies the current WID clip mask. Setting the bits in the WID clip mask
register enables the corresponding bits in the current WID register during the WID clip compare.

The draw processor 50 implements 24 frame buffer Z planes that function as a depth buffer for 3-D hidden
surface removal (HSR). A constant Z source register specifies a constant that can be written to the depth (Z)
planes as a WID extension for window clipping or for parallel pixel mode writes.

The draw processor 50 contains an image write mask register and a window write mask register. The image
write mask register provides a per plane write mask for RGBO planes. The draw processor 50 replicates bytes
in all four bytes of the image plane group. The image write mask register determines the byte written to the
frame buffer. The bit fields of the image write mask register are shown below:

D<31:24> = Overlay Mask - specifies the overlay pixel mask.
D<23:16> = Blue Mask - specifies the blue pixel mask.
D<15:8> = Green Mask - specifies the green pixel mask.
D<7:0> = Red Mask - specifies the red pixel mask.

The window write mask register provides a per plane write mask for fast clear, overlay window, and image
window planes. The bit fields are shown below:

D<15:10> = Fast Clear Mask - specifies the Fast Clear mask.
D<9:6> = Overlay Window Plane Mask - specifies the Overlay Window Plane mask.
D<5:0> = Image Window Plane Mask - specifies the Image Window Plane mask.

The draw processor 50 contains a constant alpha source register and a force color register. The constant
alpha source register specifies a constant alpha source that may be substituted for the antialias filter alpha.
The force color register specifies a constant color source that may be substituted for the color values generated
by the DAA unit. The bit fields of the force color register specify red, green, and blue pixel values.

The depth cueing operations of the draw processor 50 are controlled by a depth cue Z-front register, a
depth cue Z-back register, a depth cue scale register, a depth cue Z-scale register, and a depth cue fade color
register. The depth cue Z-front register specifies the Z-front value for use in depth cueing. The depth cue
Z-back register specifies the Z-back value for use in depth cueing.

The depth cue scale register specifies the front and back scale factors for use in depth cueing. The scale
values comprise 9 bits in the range 0 to 1.0. The depth cue Z-scale register specifies the Z-scale factor for
use in depth cueing. The Z-scale factor consists of a 9-bit mantissa and a 6-bit exponent. The depth cue fade
color register specifies the red, green, and blue fade color for use in depth cueing. PHIGS+ depth cueing is
implemented as follows:

C = S * Ci + (1 - S) * Cd

Where:

C = Component of the depth cued color
Ci = Component of the input color
Cd = Component of the depth cue fade color
S = Depth cue scale factor that is computed in the draw processor 50 as follows:
If Z is in front of Z-front, then S = front scale
If Z is behind Z-back, then S = back scale
If Z is between Z-front and Z-back, then S = back scale + (Z - Zback) * Zscale
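
The depth cue computation above can be sketched in floating point as follows. Several points here are assumptions rather than statements from the text: "in front of" is taken to mean a smaller Z value, the 9-bit scale values and the mantissa/exponent Z-scale format are replaced by ordinary floats, and the example Z-scale is chosen as (front scale - back scale) / (Zfront - Zback) so that the ramp meets the front scale at Z-front. The function names are hypothetical.

```python
def depth_cue_scale(z: float, z_front: float, z_back: float,
                    front_scale: float, back_scale: float,
                    z_scale: float) -> float:
    """Compute the depth cue scale factor S for a given Z."""
    if z < z_front:                      # assumed: smaller Z is nearer
        return front_scale
    if z > z_back:
        return back_scale
    return back_scale + (z - z_back) * z_scale


def depth_cue(ci: float, cd: float, s: float) -> float:
    """Blend one color component: C = S * Ci + (1 - S) * Cd."""
    return s * ci + (1.0 - s) * cd


# Hypothetical settings: full intensity at Z = 100, 25% at Z = 300.
z_front, z_back = 100.0, 300.0
front_scale, back_scale = 1.0, 0.25
z_scale = (front_scale - back_scale) / (z_front - z_back)

s_mid = depth_cue_scale(200.0, z_front, z_back, front_scale, back_scale, z_scale)
print(s_mid, depth_cue(200.0, 0.0, s_mid))
```

With these settings a pixel halfway between the two planes receives a scale factor halfway between the front and back scale factors.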

Figures 10-13 illustrate the pixel operations of the draw processor 50, and show the conditions tested
by the draw processor 50 prior to pixel writes to the frame buffer 100. Figure 10 illustrates the pixel operations
for the fast clear planes. Figure 11 illustrates the pixel operations for the window ID planes. Figure 12
illustrates the pixel operations for the image (ORGB) planes. Figure 13 illustrates the pixel operations for the depth
planes. The definitions of the conditions shown in the columns of Figures 10-13 are set forth below.

The pix in viewport condition, if a "1", indicates that the pixel is in the viewport defined by the view clip
minimum bound and the view clip maximum bound registers.

The pick without render condition, if a "1", indicates a pick control code (i.e. bits 31:29 of the draw processor
attribute register) of 10x, which specifies enable picking and no rendering while picking.

The WID match condition indicates that for each bit that is 1 in the WID clip mask register, the bit in fb data
in equals the corresponding bit in the current window ID register, wherein fb data in is the unmodified pixel
data.
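
The WID match condition amounts to a masked equality test, which can be sketched as follows. The function name `wid_match` and the example bit patterns are hypothetical.

```python
def wid_match(fb_data_in: int, current_wid: int, clip_mask: int) -> bool:
    """True when every masked bit of fb_data_in equals the current WID bit."""
    # XOR marks the differing bits; the mask keeps only the enabled ones.
    return ((fb_data_in ^ current_wid) & clip_mask) == 0


# The two values differ only in bit 3.
print(wid_match(0b101101, 0b100101, 0b110111))  # bit 3 is masked off
print(wid_match(0b101101, 0b100101, 0b111111))  # bit 3 is compared
```

A clip mask of zero makes every pixel match, since no bits take part in the compare.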

For the HSR win condition, H is bit 17 (hidden surface removal enable) of the draw processor attribute
register, and X is bit 14 (WID extension clip enable) of the draw processor attribute register.

For X = 0, H = 0: hsr.win = 1
For X = 0, H = 1: if (Znew <= Zold) hsr.win = 1; else hsr.win = 0
For X = 1, H = 0: if (Znew == Zold) hsr.win = 1; else hsr.win = 0
For X = 1, H = 1: if (Znew == Zold) hsr.win = 1; else hsr.win = 0
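
The four cases of the HSR win condition can be collapsed into a small function. This sketch assumes that the comparison for X = 0, H = 1 is "less than or equal", so that a nearer or equal-depth pixel wins; the function and parameter names are hypothetical.

```python
def hsr_win(z_new: int, z_old: int, h: bool, x: bool) -> bool:
    """Evaluate the HSR win condition for one pixel.

    h: hidden surface removal enable (bit 17 of the attribute register)
    x: WID extension clip enable (bit 14 of the attribute register)
    """
    if x:
        # Z planes act as a WID extension: only an exact match wins.
        return z_new == z_old
    if h:
        # Assumed comparison: the new pixel wins when it is not farther away.
        return z_new <= z_old
    return True                          # no depth test at all


print(hsr_win(5, 9, h=True, x=False))
print(hsr_win(9, 5, h=True, x=False))
print(hsr_win(9, 5, h=False, x=False))
```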

The screen door write enable condition, if a "0", indicates that bit 27 (screen door enable) of the draw
processor attribute register is a "1" and the selected screen door bit is "0".

The fast clear enable condition is bit 10 (fast clear enable) of the draw processor attribute register. The
fast clear bit, if a "1", indicates that the I and D planes are valid.

The access mode conditions are as follows:
I = Image (ORGB) planes, Direct Port write or read
D = Depth (Z) plane, Direct Port write or read
W = Window (FC + WID) planes, Direct Port write or read
ID = Image + Depth, 3D write only or Direct Port write only

The definitions for the data out to frame buffer are as follows. The element "Bit M (m, Side 1, Side 0)" means
that for each bit in mask m, if m = 0 select side 0 bit, and if m = 1 select side 1 bit. "FB Data In" means that
data written to the frame buffer 100 is the same as data read in from the frame buffer 100, i.e. the pixel is
unmodified.
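
The "Bit M" element is a per-bit two-way selection, which can be sketched as below. The function name `bit_mux` and the 32-bit default width are assumptions made for the illustration.

```python
def bit_mux(m: int, side1: int, side0: int, width: int = 32) -> int:
    """Per-bit select: take the side1 bit where m is 1, else the side0 bit."""
    full = (1 << width) - 1
    return ((side1 & m) | (side0 & ~m)) & full


# Hypothetical use as a write mask: new data replaces old data
# only in the red (D<7:0>) and blue (D<23:16>) bytes.
print(hex(bit_mux(0x00FF00FF, 0xAAAAAAAA, 0x55555555)))
```

With a mask of zero the result equals side 0, which corresponds to the "FB Data In" case when side 0 is the data read from the frame buffer.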

For example, the first line of Figure 12 indicates that for a window access, data is never written out to the
ORGB planes of the frame buffer 100 (Data Out to FB = FB Data In). The second line of Figure
12 indicates that (for an I, D, or ID access) if the pixel is not in the viewport, the pixel is not written (Data Out
to FB = FB Data In). The eighth line of Figure 12 indicates that for an I or ID pixel access, if all the conditions
to the left of the vertical line are met, the data read in from the frame buffer 100 is passed to the ROP and
blend units unchanged and data written out to the frame buffer 100 is the output of the ROP/Blend Unit (under
control of the write mask).

In the foregoing specification the invention has been described with reference to specific exemplary
embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto
without departing from the broader spirit and scope of the invention as set forth in the appended claims. The
specification and drawings are accordingly to be regarded as illustrative rather than restrictive.



Claims 

1. A draw processor for a graphics accelerator, comprising:

geometry pipeline interface circuit receiving a draw packet over a draw bus from a floating-point
processor, the draw packet containing a set of geometry parameters that define a geometry object
including high level screen space descriptions of two dimensional and three dimensional point, line, and area
graphics primitives, the geometry pipeline interface circuit adjusting the geometry parameters according
to an interleave value corresponding to the draw processor;

rendering circuit receiving the geometry parameters from the geometry pipeline interface circuit,
the rendering circuit generating a set of pixels corresponding to the geometry object by performing
edgewalking and scan interpolation functions according to the geometry parameters;

direct port interface circuit receiving a direct port packet over the draw bus from a command
preprocessor, the direct port packet containing a set of pixel function parameters that control at least one
pixel function of the draw processor;

memory control circuit receiving the pixels from the direct port interface circuit and receiving the
pixel function parameters from the direct port interface circuit, the memory control circuit writing the pixels
into a frame buffer memory while performing the pixel function.

2. The draw processor of claim 1, wherein the draw packet comprises at least one x coordinate for the
geometry object, and wherein the geometry pipeline interface circuit adds the interleave value to the x
coordinates for the geometry object.

3. The draw processor of claim 1, wherein the pixel function comprises a pixel depth cue function, and the
pixel function parameters include a depth cue z-front value, a depth cue z-back value, a depth cue scale
value, a depth cue z-scale value, and a depth cue fade color value, such that the memory control circuit
performs the pixel depth cue function on each pixel, and then writes each pixel to the frame buffer
memory.

4. The draw processor of claim 1, wherein the pixel function comprises a write pixel function, the direct port
packet containing an x and a y coordinate, and a red, a green, and a blue color value, such that the memory
control circuit writes the red, the green and the blue color value to the frame buffer memory at a frame
buffer memory address corresponding to the x and the y coordinates.

5. The draw processor of claim 1, wherein the pixel function comprises a pixel block copy function, and the
pixel function parameters include a source address, a fill size value, and a destination address, such that
the memory control circuit reads a block of source pixels from the frame buffer memory at the source
address and writes the block of source pixels to the frame buffer memory at the destination address, the
fill size value specifying a number of pixels in the block of source pixels.

6. A method for rendering pixels into a frame buffer in a graphics accelerator, comprising the steps of:

receiving a draw packet over a draw bus from a floating-point processor, the draw packet containing
a set of geometry parameters that define a geometry object, the geometry object comprising a high level
screen space description of two dimensional and three dimensional point, line, and area graphics
primitives;

adjusting the geometry parameters according to an interleave value corresponding to the draw
processor;

generating a set of pixels corresponding to the geometry object by performing edgewalking and
scan interpolation functions according to the geometry parameters;

receiving a direct port packet over the draw bus from a command preprocessor, the direct port
packet containing a set of pixel function parameters that control at least one pixel function;

writing the pixels into a frame buffer memory while performing the pixel function.

7. The method of claim 6, wherein the draw packet comprises at least one x coordinate for the geometry
object, and wherein the step of adjusting the geometry parameters comprises the step of adding the
interleave value to the x coordinates for the geometry object.

8. The method of claim 6, wherein the pixel function comprises a pixel depth cue function, and the pixel
function parameters include a depth cue z-front value, a depth cue z-back value, a depth cue scale value, a
depth cue z-scale value, and a depth cue fade color value, and wherein the step of performing the pixel
function comprises the step of performing the pixel depth cue function on each pixel, and then writing
each pixel to the frame buffer memory.

9. The method of claim 6, wherein the pixel function comprises a write pixel function, the direct port packet
containing an x and a y coordinate, and a red, a green, and a blue color value, and wherein the step of
performing the pixel function comprises the step of writing the red, the green and the blue color value to
the frame buffer memory at a frame buffer memory address corresponding to the x and the y coordinates.

10. The method of claim 6, wherein the pixel function comprises a pixel block copy function, and the pixel
function parameters include a source address, a fill size value, and a destination address, and wherein
the step of performing the pixel function comprises the steps of reading a block of source pixels from the
frame buffer memory at the source address and writing the block of source pixels to the frame buffer
memory at the destination address, such that the fill size value specifies a number of pixels in the block of
source pixels.